84 research outputs found

    Description et analyse des verbes désadjectivaux et dénominaux en -ifier et -iser

    International audience. This work studies the productive process of morphological derivation that creates deadjectival and denominal verbs with the suffixes -iser and -ifier in French. The process is analysed at the morphological, syntactic and semantic levels, both from the lexicographic point of view (extension of the morphological and syntactic lexicon Lefff) and from the dynamic point of view (automatic detection and interpretation of neologisms created by derivation).

    Towards a (better) Definition of Annotated MIR Corpora

    International audience. Today, annotated MIR corpora are provided by various research labs and companies, each using its own annotation methodology, concept definitions, and formats. This is not an issue as such. However, the lack of descriptions of the methodology used (how the corpus was actually annotated, and by whom) and of the annotated concepts (i.e., what is actually described) is a problem for the sustainability, usability, and sharing of the corpora. Experience shows that it is essential to define precisely how annotations are supplied and described. We propose here a survey and consolidation report on the nature of the annotated corpora used and shared in MIR, with proposals for the axes against which corpora can be described so as to enable effective comparison, and on the inherent influence this has on tasks performed using them.

    Corpus Linguistics for the Annotation Manager

    International audience. Hand-crafted annotated corpora are acknowledged as critical elements for Human Language Technologies, but systems have to be trained on domain-specific data to achieve a high level of performance. This is why numerous annotation campaigns are launched. The role of the annotation manager consists in designing the annotation protocol, sometimes selecting the source data, hiring the required number of annotators with adequate competences, writing the annotation guidelines, controlling the annotation process, and delivering the resulting annotated corpus with the expected quality. However, for a given task, the complexity of the annotation work seems to depend heavily on the type of corpus to annotate. Since this affects both the cost and the quality of the annotation, it is an important issue for the annotation manager to tackle. This paper illustrates the role of corpus linguistics in the management of annotations through a specific annotation campaign. We show how the corpus characteristics affect all aspects of the annotation protocol: the design of the annotation guidelines, the selection of a sub-corpus for training, the duration of the annotators' training, the complexity of the annotation formalism, and the quality of the resulting annotation.

    Vers une méthodologie d'annotation des entités nommées en corpus ?

    National audience. Today, the named entity recognition task is considered fundamental, but it involves some specific difficulties in terms of annotation. We list them here, with illustrations taken from manual annotation experiments in microbiology. Those issues lead us to ask the fundamental question of what the annotators should annotate and, even more importantly, for which purpose. We thus identify the applications that use named entity recognition and, according to the real needs of those applications, we propose to semantically define the elements to annotate. Finally, we put forward a number of methodological recommendations to ensure a coherent and reliable annotation scheme.

    Introduction : Text and image in children's literature

    Issue theme: Volume 1: Power and Authority in Text and Image: the educational and political dimension of children’s literature. Publisher PDF. Non peer reviewed.

    TCOF-POS : un corpus libre de français parlé annoté en morphosyntaxe

    National audience. This article details the creation of TCOF-POS, the first freely available corpus of spontaneous spoken French. We present the methodology that was followed in order to obtain the best possible quality in the final resource. The corpus is already freely available and can be used as a training/validation corpus for NLP tools, as well as a study corpus for linguistic research. We also present the results obtained by two POS-taggers trained on the corpus.

    Modeling the Complexity of Manual Annotation Tasks: a Grid of Analysis

    International audience. Manual corpus annotation is becoming widely used in Natural Language Processing (NLP). Although it is recognized as a difficult task, no in-depth analysis of its complexity has yet been performed. We provide in this article a grid of analysis of the different complexity dimensions of an annotation task, which helps to estimate beforehand the difficulties and cost of annotation campaigns. We observe the applicability of this grid on existing annotation campaigns and detail its application on a real-world example.

    Rotary Drum Separator and Pump for the Sabatier Carbon Dioxide Reduction System

    A trade study conducted in 2001 selected a rotary disk separator as the best candidate to meet the requirements for an International Space Station (ISS) Carbon Dioxide Reduction Assembly (CRA). The selected technology must provide micro-gravity gas/liquid separation and pump the liquid from 10 psia at the gas/liquid interface to 18 psia at the wastewater bus storage tank. The rotary disk concept, which has pedigree in other systems currently being built for installation on the ISS, failed to achieve the required pumping head within the allotted power. The separator discussed in this paper is a new design that was tested to determine compliance with performance requirements in the CRA. The drum separator and pump (DSP) design is similar to the Oxygen Generator Assembly (OGA) Rotary Separator Accumulator (RSA) in that it has a rotating assembly inside a stationary housing driven by an integral internal motor. The innovation of the DSP is the drum-shaped rotating assembly, which acts as the accumulator and also pumps the liquid using much less power than its predecessors. In the CRA application, the separator will rotate at slow speed while accumulating water. Once full, the separator will increase speed to generate sufficient head to pump the water to the wastewater bus. A proof-of-concept (POC) separator has been designed, fabricated and tested to assess the separation efficiency and pumping head of the design. This proof-of-concept item was flown aboard the KC135 to evaluate the effectiveness of the separator in a microgravity environment. This separator design has exceeded all of the performance requirements. The next step in the separator development is to integrate it into the Sabatier Carbon Dioxide Reduction System. This will be done with the Sabatier Engineering Development Unit at the Johnson Space Center.

    Crowdsourcing for Language Resource Development: Criticisms About Amazon Mechanical Turk Overpowering Use

    International audience. This article is a position paper about Amazon Mechanical Turk, the use of which has been steadily growing in language processing in the past few years. According to the mainstream opinion expressed in articles of the domain, this type of online working platform makes it possible to quickly develop all sorts of quality language resources, at a very low price, by people doing it as a hobby. We shall demonstrate here that the situation is far from being that ideal. Our goal here is manifold: (1) to inform researchers, so that they can make their own choices; (2) to develop alternatives with the help of funding agencies and scientific associations; (3) to propose practical and organizational solutions in order to improve language resource development, while limiting the risks of ethical and legal issues without sacrificing price or quality; (4) to introduce an Ethics and Big Data Charter for the documentation of language resources.

    Un turc mécanique pour les ressources linguistiques : critique de la myriadisation du travail parcellisé

    International audience. This article is a position paper concerning Amazon Mechanical Turk-like systems, the use of which has been steadily growing in natural language processing in the past few years. According to the mainstream opinion expressed in the articles of the domain, these online working platforms make it possible to develop all sorts of quality language resources very quickly, for a very low price, by people doing it as a hobby. We shall demonstrate here that the situation is far from being that ideal, be it from the point of view of quality, price, workers' status or ethics. We shall then recall the alternatives that already exist or have been proposed. Our goal here is twofold: to inform researchers, so that they can make their own choices with full knowledge of the facts, and to propose practical and organizational solutions in order to improve the development of new language resources, while limiting the risks of ethical and legal issues without sacrificing price or quality.